Latency in local, two-dimensional, fault-tolerant quantum computing

We analyze the latency of fault-tolerant quantum computing based on the 9-qubit Bacon-Shor
code using a local, two-dimensional architecture. We embed the data qubits in a 7 by 7 array of
physical qubits, where the extra qubits are used for ancilla preparation and qubit transportation
by means of a SWAP chain. The latency is reduced with respect to a similar implementation using
Steane's 7-qubit code [1]. Furthermore, the error threshold is also improved to $2.02 \times 10^{-5}$ when
memory errors are taken to be one tenth of the gate error rates.



I. INTRODUCTION 



The fragility of quantum information has always been one of the most important obstacles for the development
of practical implementations of quantum computing. The theory of fault-tolerant quantum computation was
developed to deal with the problem of computing in the presence of errors, even when the quantum gates required in
this process are not perfectly reliable [2]. One of the most important achievements in the field, the threshold theorem,
assures us that an arbitrarily long quantum computation can be performed as long as the error rates of the faulty
gates are below a certain error threshold [3].

The price of dealing with errors and faulty gates is paid in terms of the space and time overhead of the
implementation. To reduce the error rates of encoded gates, a quantum error-correcting code is concatenated with itself
(or another code), requiring more space and time to perform a reliable encoded gate. On top of this, many physical
implementations impose more constraints on the design of fault-tolerant circuits. The most important is the locality
of interactions, which forces us to move qubits next to each other whenever we want them to interact through a
quantum gate. We can see that the characteristics of each implementation will be affected by its underlying
architecture. Fault-tolerant designs have been studied that use different error-correcting codes [?] and consider different
constraints and architectures [?].

For many possible implementations of quantum computing (such as solid state, ions in optical lattices,
superconducting qubits) the architecture is not only local but also restricted in dimensionality. In particular, a
two-dimensional architecture seems appealing, since we want to be able to manipulate single qubits using classical
controls, and a planar architecture will leave us room to do that. This constraint, together with the locality of
interactions, will clearly affect the latency of the computation, since a large amount of qubit transport will need to be
accomplished. It is then important to study how to optimize this transport in order to reduce the time overhead. In this
paper we analyze this problem for a local, two-dimensional architecture. Our approach is to choose the error correction
code and the implementation of encoded gates in order to minimize the time required, while at the same time trying to
keep the space overhead as small as possible.

The paper is organized as follows. In Section II we begin by discussing the basic ideas of fault-tolerance and identify
the error-correction procedure as the main contributor to the latency of concatenated implementations. In Section
III we discuss the properties of the 9-qubit Bacon-Shor error-correcting code [?]. In Section IV we take a look at
the very useful properties of the fault-tolerant error correction procedure for the 9-qubit code. In Section V we show
how to implement fault-tolerant encoded gates using this code in a local, two-dimensional architecture, paying special
attention to the latency of the encoded gates. In Section VI we compute the error threshold for the Clifford-group
gates. Finally, in Section VII we discuss the results and present our conclusions.



II. LATENCY IN FAULT-TOLERANT CIRCUIT DESIGN 



Quantum encoding and fault-tolerant circuit design are the two main tools used in dealing with errors that occur
during the operation of quantum gates in the implementation of a quantum circuit. By encoding quantum information
in the state of several qubits we can make it more robust to the effects of errors. If these errors can be detected and
corrected, the encoded quantum information can be preserved. However, since the error detection and correction must
be implemented with the same faulty gates, special care must be taken to prevent possible errors from propagating
too much during the computation. Fault-tolerant design deals exactly with this problem.

While encoding reduces the logical error rate of the computation, this reduction might not be enough to allow the
desired computation to be performed with a high probability of success. To reduce this error rate even further, the
encoding can be concatenated with itself (or some other encoding). Concatenation is very powerful in suppressing the
error rate. If $\epsilon_{phys}$ is the physical error rate, then the logical error rate is given by

\[ \epsilon_L = \epsilon_0 \left( \frac{\epsilon_{phys}}{\epsilon_0} \right)^{2^k}, \tag{1} \]

where $k$ is the number of levels of concatenation, and $\epsilon_0$ is the error threshold. We can see that if the physical error
rate is below the error threshold, the logical error rate is suppressed superexponentially. The error threshold depends
on the details of the encoding and its implementation.
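
As a rough numerical illustration of this suppression, the following sketch evaluates the standard concatenation formula $\epsilon_L = \epsilon_0 (\epsilon_{phys}/\epsilon_0)^{2^k}$. The function name and the physical error rate are hypothetical choices made only for this example; the threshold value is the one quoted later in the paper for equal error rates.

```python
def logical_error_rate(eps_phys: float, eps_0: float, k: int) -> float:
    """Logical error rate after k levels of concatenation (doubly exponential form)."""
    return eps_0 * (eps_phys / eps_0) ** (2 ** k)


eps_0 = 1.3e-5     # threshold quoted in the paper for equal error rates
eps_phys = 1.3e-6  # hypothetical physical error rate, ten times below threshold

# Error rate at concatenation levels 0 (unencoded) through 3.
rates = [logical_error_rate(eps_phys, eps_0, k) for k in range(4)]
```

Under these assumptions each additional level squares the ratio $\epsilon_{phys}/\epsilon_0$, so a physical error rate ten times below threshold gains two, four, and then eight orders of magnitude relative to $\epsilon_0$.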

The price of using concatenation to reduce the error rate is paid as an increase in the size of the circuit, both in
depth and width. This can be seen from the canonical procedure for implementing a given quantum circuit in a
fault-tolerant way at each level of concatenation. To do this we replace each physical qubit by an encoded block, and each
gate by an encoded gate. On top of that, after every encoded gate we perform error correction on the encoding block.
Both the encoded gates and the error correction procedure must be fault-tolerant themselves, to prevent errors from
propagating to other blocks before they can be corrected. This procedure generates a self-similar structure as can be
seen in Figure 1. In this paper we are interested in studying the impact on latency (or circuit depth) of concatenation,
in particular when other constraints are present. Let $L_{ec}$ be the latency of the error correction routine (measured in
number of time steps required) applied to every block. Since error correction is performed after every encoded gate,
if the circuit has $k$ levels of concatenation, the blowup factor with respect to the latency of the unencoded circuit will
be at least $L_{ec}^k$. Other contributions to this factor will be given by the latency of the encoded gates. But for many of
the most used quantum codes, encoded gates can be performed transversally (requiring only one time step) or require
at most a few time steps. Thus, the latency of the implementation is most heavily influenced by the latency of the
error correcting routine. Finding fast (in terms of latency) error correcting procedures then becomes the key element
in trying to reduce the overall latency.
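
To get a feeling for how quickly this lower bound grows, the sketch below evaluates $L_{ec}^k$ for the two error-correction latencies discussed in this paper (6 time steps without locality constraints, 7 time steps in the local, two-dimensional layout). The function name is a hypothetical helper, and the bound ignores the latency of the encoded gates themselves.

```python
def latency_blowup(l_ec: int, k: int) -> int:
    """Lower bound on the latency blowup factor after k levels of concatenation."""
    return l_ec ** k


# Blowup for concatenation levels 1 to 3.
abstract_blowup = [latency_blowup(6, k) for k in range(1, 4)]  # non-local circuit
local_blowup = [latency_blowup(7, k) for k in range(1, 4)]     # local 2D layout
```

Even a single extra time step per error-correction round compounds with every level, which is why the latency of error correction is the quantity worth optimizing.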

Let us first take a look at the structure of error correction. There are some basic steps that any procedure that
corrects errors must follow. First, we need to prepare a suitable ancilla state. This requires preparing single qubits
in certain states, such as the eigenstates of one of the Pauli operators. This will require at least one time step.
The ancilla state will also require entanglement, and hence we need at least another time step to apply the two-qubit
operations required to produce it (we might need more than just one time step). Once the ancilla has been prepared,
we need to make it interact with the qubits in the encoding block. Again, this process takes at least one time step,
but may in general take more (for CSS codes, the data qubits interact with ancilla qubits twice to detect X and Z
errors respectively). Finally, the ancilla has to be measured to extract the error syndrome, which requires another
time step. The actual error correction can be postponed by updating the Pauli frame, as long as the gates we apply
belong to the Clifford group.

We can see then that performing error correction will require at the very least 4 time steps for any implementation. 
Hence, to improve latency we need to look for quantum codes that allow fault-tolerant implementations of error 
correction to run in as few time steps as possible. And we have given a sort of benchmark for what "few time steps" 
means. But we are also interested in studying this problem under other real-world constraints, like locality and a 
two-dimensional architecture. We will show an implementation with exactly those constraints that can perform error 
correction in 7 time steps. 




FIG. 1: Recursive structure of concatenated implementations of quantum error correction.



III. THE BACON-SHOR NINE-QUBIT CODE



Our implementation will take advantage of some of the useful properties of the Bacon-Shor 9-qubit code. We will
only give a basic description of this code here, since a more detailed description can be found elsewhere [?].

The Bacon-Shor 9-qubit code is a stabilizer CSS code that encodes one logical qubit into nine physical qubits. The
distance of this code is 3, so using the standard notation it is a [[9,1,3]] code. To describe the properties of this code
it will be useful to consider the nine physical qubits as if they were placed on the vertices of a $3 \times 3$ lattice as seen in
Figure 2.




FIG. 2: Arrangement of data qubits for the 9-qubit Bacon-Shor code.



The code can be defined by the stabilizer group $\mathcal{S}$ generated by the stabilizer operators

\[
\begin{aligned}
S_1 &= X_1 X_2 X_3 X_4 X_5 X_6, \\
S_2 &= X_4 X_5 X_6 X_7 X_8 X_9, \\
S_3 &= Z_1 Z_2 Z_4 Z_5 Z_7 Z_8, \\
S_4 &= Z_2 Z_3 Z_5 Z_6 Z_8 Z_9,
\end{aligned}
\tag{2}
\]

where $X_i$ and $Z_i$ represent the usual Pauli operators applied to the $i$-th qubit. Operators $S_1$ and $S_2$ correspond to
X operators applied to the first and second row of qubits, and the second and third row, respectively. In a similar
manner, $S_3$ and $S_4$ correspond to Z operators applied to the first and second column of qubits, and the second and
third column, respectively.

The different syndromes (i.e., the vector of eigenvalues of the four operators in $\mathcal{S}$) induce a decomposition of the
Hilbert space of the nine qubits in the code block into subspaces, each actually encoding five logical qubits. In each one of
these subspaces we can define a subsystem decomposition and write

\[ \mathcal{H} = \bigoplus_{\text{syndromes}} \mathcal{H}_L \otimes \mathcal{H}_T, \tag{3} \]

where the direct sum is over all possible syndromes, $\mathcal{H}_L$ is the Hilbert space of the logical qubit that will be fully
protected by the code, and $\mathcal{H}_T$ is the Hilbert space of the remaining encoded qubits (which will not be fully protected).
The logical operators associated with the encoded qubit are given by

\[
\begin{aligned}
X_L &= X_1 X_2 X_3, \\
Z_L &= Z_1 Z_4 Z_7.
\end{aligned}
\tag{4}
\]

We can see that $X_L$ corresponds to X operators acting on all the qubits on the first row of Figure 2, while $Z_L$
corresponds to Z operators acting on the first column. It is easy to check that these logical operators commute with
the stabilizer generators. The logical operators for qubits encoded in $\mathcal{H}_T$ can be chosen from the nonabelian group

\[
\begin{aligned}
\mathcal{T} = \langle & X_1 X_4,\ X_4 X_7,\ X_2 X_5,\ X_5 X_8,\ X_3 X_6,\ X_6 X_9, \\
& Z_1 Z_2,\ Z_2 Z_3,\ Z_4 Z_5,\ Z_5 Z_6,\ Z_7 Z_8,\ Z_8 Z_9 \rangle.
\end{aligned}
\tag{5}
\]

It is easy to see that any operator in $\mathcal{T}$ commutes not only with the stabilizer generators, but also with the logical
operators in Eq. (4). This leads to one of the key features of this code (and of subsystem encoding in general): the
state of the logical qubit is not uniquely encoded. Once this state has been encoded in the nine-qubit block, applying
any operator in $\mathcal{T}$ does not affect the encoded quantum information, since these operators commute with the logical
operators $X_L$ and $Z_L$. This simplifies error correction and makes some two-qubit errors actually trivial.
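
These commutation relations are easy to check mechanically. The sketch below is a minimal illustration, not part of the original analysis: it represents each operator as a pair of bit masks (where X and Z act) and uses the fact that two Pauli products commute exactly when they overlap on an even number of X-versus-Z positions. The helper names `pauli` and `commute` are hypothetical.

```python
def pauli(kind, qubits):
    """Return (x_mask, z_mask) for a product of X or Z operators on qubits 1..9."""
    mask = sum(1 << (q - 1) for q in qubits)
    return (mask, 0) if kind == "X" else (0, mask)


def commute(p, q):
    """Two Pauli products commute iff their symplectic product is even."""
    overlap = bin(p[0] & q[1]).count("1") + bin(p[1] & q[0]).count("1")
    return overlap % 2 == 0


stabilizers = [
    pauli("X", (1, 2, 3, 4, 5, 6)), pauli("X", (4, 5, 6, 7, 8, 9)),
    pauli("Z", (1, 2, 4, 5, 7, 8)), pauli("Z", (2, 3, 5, 6, 8, 9)),
]
x_l, z_l = pauli("X", (1, 2, 3)), pauli("Z", (1, 4, 7))
gauge = [pauli("X", p) for p in [(1, 4), (4, 7), (2, 5), (5, 8), (3, 6), (6, 9)]]
gauge += [pauli("Z", p) for p in [(1, 2), (2, 3), (4, 5), (5, 6), (7, 8), (8, 9)]]

# Gauge generators commute with every stabilizer generator and both logical operators.
gauge_is_harmless = all(commute(g, op) for g in gauge for op in stabilizers + [x_l, z_l])
# The logical operators commute with the stabilizers but anticommute with each other.
logicals_ok = all(commute(l, s) for l in (x_l, z_l) for s in stabilizers) and not commute(x_l, z_l)
# The gauge group is nonabelian: for example X1X4 and Z1Z2 overlap on one qubit.
gauge_nonabelian = not commute(gauge[0], gauge[6])
```

All three flags evaluate to true for the operators listed above, in line with the statements in the text.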



IV. FAULT-TOLERANT ERROR CORRECTION



Since the 9-qubit Bacon-Shor code is a CSS code, we can apply the techniques introduced by Steane to
perform fault-tolerant error correction. This requires preparing the logical state $|0\rangle_L$, a simultaneous eigenstate of
the stabilizer generators and the logical operator $Z_L$ with eigenvalues +1. As we pointed out before, this does not
completely determine the encoded state, since we can apply any operator in the set $\mathcal{T}$ without perturbing the
encoded information. This ambiguity can be fixed by requiring the encoded state to be an eigenstate of the operators
$X_1 X_4$, $X_4 X_7$, $X_2 X_5$, $X_5 X_8$, $X_3 X_6$, $X_6 X_9$ (which belong to $\mathcal{T}$) with eigenvalue +1. It is not difficult to check that the
only state satisfying these constraints is given by

\[
|0\rangle_L = \frac{1}{2\sqrt{2}} \left( |{+}{+}{+}\rangle_{147} + |{-}{-}{-}\rangle_{147} \right) \left( |{+}{+}{+}\rangle_{258} + |{-}{-}{-}\rangle_{258} \right) \left( |{+}{+}{+}\rangle_{369} + |{-}{-}{-}\rangle_{369} \right),
\tag{6}
\]

where $|\pm\rangle = \frac{1}{\sqrt{2}} (|0\rangle \pm |1\rangle)$, as usual. Geometrically, this state corresponds to the tensor product of three "cat" states
in the Hadamard rotated basis, each one comprising the qubits in each column in Figure 2. Similarly, it is not difficult
to see that the encoded $|+\rangle_L$ state corresponds to three "cat" states in the computational basis, each one now lying
across each one of the three rows of qubits in Figure 2. This geometric description will be very useful when we look
at how to prepare and use these states in a local, two-dimensional architecture.
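
The eigenvalue conditions on the encoded zero state can be verified directly on the 512 computational-basis amplitudes. The sketch below relies on one observation that is ours rather than the paper's wording: a three-qubit cat state in the Hadamard-rotated basis, $(|{+}{+}{+}\rangle + |{-}{-}{-}\rangle)/\sqrt{2}$, is the uniform superposition of the four even-parity bit strings, so the encoded state is uniform over the 64 nine-bit strings with even parity in every column. The function names are hypothetical.

```python
from itertools import product

COLUMNS = [(1, 4, 7), (2, 5, 8), (3, 6, 9)]


def logical_zero():
    """Amplitudes of the encoded zero state, keyed by a 9-bit integer (bit q-1 is qubit q)."""
    state = {}
    for bits in product([0, 1], repeat=9):
        if all(sum(bits[q - 1] for q in col) % 2 == 0 for col in COLUMNS):
            state[sum(b << i for i, b in enumerate(bits))] = 1 / 8
    return state


def apply_pauli(kind, qubits, state):
    """Apply a product of X (bit flips) or Z (signs) operators to the state."""
    mask = sum(1 << (q - 1) for q in qubits)
    if kind == "X":
        return {key ^ mask: amp for key, amp in state.items()}
    return {key: amp * (-1) ** bin(key & mask).count("1") for key, amp in state.items()}


state = logical_zero()
# +1 eigenstate of the logical Z operator and of a representative gauge operator.
z_logical_ok = apply_pauli("Z", (1, 4, 7), state) == state
gauge_ok = apply_pauli("X", (1, 4), state) == state
# The logical X operator maps the state to an orthogonal one (disjoint support).
x_logical_flips = set(apply_pauli("X", (1, 2, 3), state)).isdisjoint(state)
```

The same check passes for the four stabilizer generators and the remaining X-type gauge operators, since each of them flips an even number of bits in every column.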

To extract the error syndrome in the Steane fault-tolerant error correction scheme, we need to apply the circuit in
Figure 3. To look for Z errors on the data block, an ancilla block encoded in the $|0\rangle_L$ state interacts with the data
block through an encoded CNOT gate (which for this code can be implemented transversally). The qubits in the
ancilla block are then measured in the X basis, and the results of these measurements are used to classically compute
the error syndrome. But before the ancilla block interacts with the data, it needs to be verified using an extra ancilla
block also encoded in the $|0\rangle_L$ state, to check for possible X errors in the ancilla block that could propagate to the
data. To make the procedure fault-tolerant, this verification step needs to be carried out three times, and the ancilla
block is accepted if and only if no more than one verification step failed. A similar scheme is used to check the data
qubits for X errors.

FIG. 3: Quantum circuit for Steane's fault-tolerant syndrome extraction.

The ancilla verification step is the main contributor to the latency of the error correction procedure, since we
need to construct a verification block at least two times before allowing the ancilla block to interact with the data.
Fortunately, Aliferis and Cross [?] showed that for the 9-qubit code using Steane's procedure the verification step
is not required. This property can be traced back to the fact that the encoded states $|0\rangle_L$ and $|+\rangle_L$ break
down into three entangled "cat" states of only three qubits each, and these "cat" states are also rather robust against
pairs of errors that could be introduced by a faulty gate during their preparation. This decomposition into a tensor
product of states with fewer qubits also makes the preparation process efficient in terms of latency.

The circuit for syndrome extraction then takes a very simple form, as can be seen in Figure 4. From this figure we
can see that the latency for the error correction routine is 6 time steps. Preparing the ancilla requires 3 time steps:
one for single-qubit preparations and two to entangle them to form a "cat" state (or its Hadamard rotated version).
The interaction with the data block through a transversal CNOT can be accomplished in one time step and so can
the final ancilla measurements. Even though each syndrome extraction can be accomplished in 5 time steps, we need
to delay one of them by 1 time step since the two ancilla blocks cannot interact with the data at the same time, and
this gives us a total latency of 6 time steps. As we discussed before, instead of actually correcting an error we
will only update the Pauli frame. This can be done as long as we apply gates that belong to the Clifford group, since
they preserve the tensor product form of the errors. We will see that we can relegate non-Clifford gates to the highest
level of concatenation (i.e., the algorithmic level), so we can run all the lower levels without physically performing
Pauli operations to correct the errors.
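
The idea of updating the Pauli frame instead of applying corrections can be sketched as classical bookkeeping. The class below is a hypothetical illustration, not the authors' implementation: it stores one X bit and one Z bit per qubit and updates them with the standard conjugation rules of the Hadamard and CNOT gates, which map Pauli products to Pauli products.

```python
class PauliFrame:
    """Tracks pending X and Z corrections per qubit instead of applying them physically."""

    def __init__(self, n_qubits):
        self.x = [0] * n_qubits
        self.z = [0] * n_qubits

    def record(self, qubit, pauli):
        """Record a correction ("X", "Z" or "Y") inferred from a measured syndrome."""
        if pauli in ("X", "Y"):
            self.x[qubit] ^= 1
        if pauli in ("Z", "Y"):
            self.z[qubit] ^= 1

    def hadamard(self, qubit):
        """A Hadamard gate exchanges pending X and Z corrections."""
        self.x[qubit], self.z[qubit] = self.z[qubit], self.x[qubit]

    def cnot(self, control, target):
        """A CNOT copies X from control to target and Z from target to control."""
        self.x[target] ^= self.x[control]
        self.z[control] ^= self.z[target]


frame = PauliFrame(2)
frame.record(0, "X")   # pending X correction on the control qubit
frame.cnot(0, 1)       # the correction now sits on both qubits
frame.hadamard(0)      # on qubit 0 it becomes a pending Z correction
```

Because every update is a few classical bit operations, this bookkeeping adds no time steps to the quantum circuit, consistent with the latency count above.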

FIG. 4: Fault-tolerant circuit for syndrome extraction for the 9-qubit Bacon-Shor code, introduced by Aliferis and Cross.



V. LOCAL, TWO-DIMENSIONAL ARCHITECTURE 



The fault-tolerant error correction procedure described above relies on an abstract architecture in which there is
no restriction on the interaction between qubits. A two-qubit gate can be applied to any pair of qubits in one
time step. In reality, a physical implementation of quantum computing will usually impose a locality constraint that
requires that two interacting qubits are actually next to each other. Then, to implement an abstract quantum circuit
we will need to introduce an information transport process that is based on local interactions. The simplest way of
doing that is to use a chain of SWAP gates, in which the states of two neighboring qubits are exchanged. However, we
need to be careful if we want to preserve the fault-tolerance of the scheme, since a faulty SWAP might introduce
two errors in one block if it is applied to two data qubits.

Besides the locality of the interaction, as a matter of practical design it is advantageous to lay our qubits in some 
kind of two-dimensional array. This would allow the unavoidable classical control signals to have easier access to each 
individual qubit in order to perform necessary operations like single qubit preparation and measurement, and single 
qubit gates. Then the problem becomes how to design a fault-tolerant error correction procedure that minimizes 
the impact on latency of the locality constraint on a two-dimensional architecture, trying to keep the space overhead 
under control and without lowering too much the error threshold. 

In [1], Svore, DiVincenzo and Terhal studied this type of architecture, although their main objective was to compute
the error threshold for that implementation. They used Steane's 7-qubit code to encode the quantum information
and applied Steane's scheme for error correction. They embedded the seven data qubits in a $6 \times 8$ array of physical
qubits. The extra qubits were used for ancilla preparation and measurement, and for qubit transport by means of
SWAP gates. We will consider a similar type of implementation based on the 9-qubit Bacon-Shor code.

Our implementation consists of embedding the nine data qubits of the 9-qubit code in a $7 \times 7$ array of physical
qubits. During error correction, the data qubits are located at the positions shown in Figure 5. The remaining qubits
(represented by O) act both as the workspace in which ancillas are prepared and measured, and as the "rails" on
which data and ancilla qubits are transported using chains of SWAP gates.

 O  O  O  O  O  O  O
 O d1  O d2  O d3  O
 O  O  O  O  O  O  O
 O d4  O d5  O d6  O
 O  O  O  O  O  O  O
 O d7  O d8  O d9  O
 O  O  O  O  O  O  O

FIG. 5: Encoding block: the data qubits are embedded in a 7 by 7 array of physical qubits. The remaining "dummy" qubits
(denoted by the letter O) are used for ancilla preparation and qubit transportation.
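
The layout of the encoding block can be generated programmatically, which is convenient when simulating qubit transport. The following sketch is only an illustration of the geometry in Figure 5; the function name and the string labels are hypothetical choices.

```python
def encoding_block(size=7):
    """Build the 7 by 7 block: data qubits on odd rows and columns, dummy qubits elsewhere."""
    grid = [["O"] * size for _ in range(size)]
    label = 1
    for row in range(1, size, 2):
        for col in range(1, size, 2):
            grid[row][col] = f"d{label}"
            label += 1
    return grid


block = encoding_block()
for row in block:
    print(" ".join(f"{cell:>2}" for cell in row))

# Number of dummy qubits available as workspace and transport rails.
dummy_count = sum(cell == "O" for row in block for cell in row)
```

With this placement every data qubit is separated from its nearest data neighbors by one dummy qubit, so 40 of the 49 physical qubits serve as workspace.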



A. Error Correction



As can be seen in Figure 4, to perform error correction we need to prepare three "cat" states, each interacting
with one (and only one) of the rows of data qubits in Figure 5, and three Hadamard-rotated "cat" states to interact
with the three columns of data qubits. For the first and last row (column), we can use the external rows (columns) in
Figure 5 to prepare the required ancilla and move some of these ancilla qubits next to the corresponding data qubit.
Each of these parts of the error correction can be performed independently of one another. But for the ancilla states
that interact with the middle row and column, we need to carefully time their preparation, movement and interaction
with the data, because they make use of the same workspace. This additional complication (due to the locality and
two-dimensional character of this implementation) increases the latency of the error correction procedure with respect
to the abstract case by only one time step.

In Figure 6 we present two snapshots of the error correction procedure (the complete sequence is presented in the
appendix). The data qubits are denoted by d1, d2, etc., and remain fixed during the error correction procedure. The
ancilla qubits (denoted by a1, a2, etc.) are prepared and transported using the workspace of dummy qubits (denoted
by O). The preparation of ancilla qubits is denoted by $P_{X,Z}(a_j)$, and represents the preparation of the qubit in the
eigenstate of either X or Z, with eigenvalue +1. Note that not all ancilla qubits are prepared at the same time, in order
to reduce the number of memory error locations. The single arrows represent CNOT gates, pointing from control
to target qubits. The double-headed arrows represent SWAP gates. In time step 2 we can see the second step in
the preparation of all the required cat states. In time step 4 we can see the data qubits interacting with the ancilla,
some ancilla qubits still moving via SWAP gates to get in position, and one ancilla qubit being measured (denoted by
$M_{X,Z}(a_j)$).

FIG. 6: Two snapshots of the error correction procedure.



B. Encoded gates



For the purpose of concatenation we need to be able to perform both the encoded CNOT and the encoded SWAP
gate. The encoded SWAP gate is very simple, since it only involves moving the data qubits from one encoding block
to another. This is done using SWAP gates. It is clear that the latency of the encoded SWAP gate is only 7 time
steps.

For the 9-qubit code, the CNOT gate is transversal. Then we only need to move the data qubits so that the
corresponding qubits of two neighboring blocks are next to each other, and then apply a physical CNOT between
each pair. The most efficient way of doing this, from the point of view of latency, is to interleave the rows (or columns)
of the two blocks. To do this, we first move all the data qubits in one block up one row (or left one column), while
the data qubits on the other block start moving laterally (or vertically) towards the first block. Then we keep moving
the data qubits towards each other, interleaving the rows (columns) until the corresponding data qubits are next to
each other. Then we apply a physical CNOT between each pair, and move all data qubits back to their original
positions. In this way, the latency of the encoded CNOT is only 9 time steps.

The encoded Hadamard gate can also be easily implemented. As noted in [?], the encoded Hadamard is equivalent
to applying single-qubit Hadamards to all physical qubits and rotating the $3 \times 3$ array of qubits by 90 degrees. In our
implementation, this rotation can be accomplished by transporting the data qubits in the perimeter of the array in
Figure 5 (i.e., all qubits except d5) along that perimeter until the rows are transformed into the columns of the array.
This transport can be done in 4 time steps, since we can move all qubits at the same time. Then, the total latency of
the encoded Hadamard gate is only 5 time steps: one to apply single-qubit Hadamards and four to rotate the array.
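
The rotation along the perimeter can be made concrete with a small sketch. Here the eight perimeter sites of the $3 \times 3$ data array are listed in clockwise order and every data qubit advances two sites along that cycle; the clockwise direction and all names are assumptions made for illustration. In the 7 by 7 layout neighboring data sites are two lattice positions apart, so two perimeter sites correspond to four SWAP time steps.

```python
# Perimeter sites of the 3x3 data array, listed clockwise starting at the top-left corner.
PERIMETER = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def rotate_quarter_turn(grid):
    """Advance every perimeter qubit two sites clockwise; the center qubit stays put."""
    rotated = [row[:] for row in grid]
    for i, (r, c) in enumerate(PERIMETER):
        new_r, new_c = PERIMETER[(i + 2) % len(PERIMETER)]
        rotated[new_r][new_c] = grid[r][c]
    return rotated


data = [["d1", "d2", "d3"], ["d4", "d5", "d6"], ["d7", "d8", "d9"]]
rotated = rotate_quarter_turn(data)

# Two perimeter sites, each two lattice positions apart in the 7x7 array.
swap_time_steps = 2 * 2
```

After the move the old first row d1, d2, d3 occupies the last column from top to bottom, so rows have indeed become columns, and the transport cost matches the four time steps quoted above.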



C. Encoded preparation and measurement 



Finally, we need to be able to fault-tolerantly prepare the encoded |0) and |+) states, and fault-tolerantly measure 
the encoded operators X and Z. Since the 9-qubit code is a CSS code, the fault-tolerant measurement reduces to 
measuring the single-qubit operator X (or Z) for each data qubit in the encoding block and classically post-processing 
the outcomes (as is usual, we assume that classical processing can be done flawlessly.) 

For the fault-tolerant preparation of encoded states, we will use a procedure that applies to any CSS code, and is
illustrated in Figure 7 for the encoded state $|0\rangle_L$. This scheme requires that we prepare four copies of the encoded state.

FIG. 7: Fault-tolerant preparation of the encoded $|0\rangle_L$ state.

As discussed above, the state $|0\rangle_L$ is equivalent to three "cat" states in the Hadamard-rotated basis, each comprised
of the qubits in one of the columns of data qubits in Figure 5. To prepare the four copies we need to use 36 physical
qubits. We can choose them to be the $6 \times 6$ array obtained when we subtract the first row and column from the $7 \times 7$
array. We can use 3 rows and 6 columns to prepare two copies of $|0\rangle_L$, with their columns interleaved (giving us the 4
copies needed in the $6 \times 6$ array). Once these copies are prepared we can apply encoded CNOT gates on each pair
of encoded states, which takes only one time step. Then we measure two of the copies, and then move the qubits in
the columns vertically using SWAP gates in order to position them next to the corresponding qubit in the remaining
copy. We then apply another CNOT gate and measure one of the copies. The remaining qubits encode the state $|0\rangle_L$.
This preparation procedure requires 9 time steps. A step-by-step pictorial representation of this preparation can be
found in the appendix. The encoded state $|+\rangle_L$ can be prepared in a similar way, interleaving rows instead of columns.

Gate                            9-qubit   7-qubit
CNOT                            16        35
SWAP                            14        35
$|0\rangle_L$-preparation       16        41
$|+\rangle_L$-preparation       16        41
Measurement                     1         1

TABLE I: Latencies (in number of time steps) of encoded gates for the 9-qubit and 7-qubit codes.



D. 1-Recs and latency 



A 1-Rectangle (1-Rec) is formed by an encoded gate followed by error correction on all the encoded blocks involved
in the gate (with the exception of the 1-measurement, which requires only classical post-processing). Table I summarizes
the above results, showing the latency in time steps of each 1-Rec for both the 9-qubit code
discussed here and the 7-qubit code presented in [1]. The longest 1-Rec is the one corresponding to the preparation
of encoded states. However, since these preparations are always performed on blocks representing dummy qubits that
are free during the previous time step, we can always start these preparations ahead of time, reducing their effective
latency. Thus, the lower limit for speed in this implementation is given by the CNOT 1-Rec, which requires 16 time
steps. This produces a reduction in latency with respect to the 7-qubit code implementation in [1] by a factor of
$(16/35)^k \approx 0.45^k$, with $k$ the number of concatenation levels.
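
The 1-Rec latencies for the 9-qubit code follow from adding the 7 time steps of error correction to the latency of each encoded gate quoted in the previous subsections. The short sketch below reproduces that arithmetic and the resulting latency ratio with respect to the 7-qubit implementation; the dictionary keys and function name are hypothetical labels.

```python
EC_LATENCY = 7  # error correction in the local, two-dimensional layout

# Latencies of the bare encoded gates quoted in the text (in time steps).
GATE_LATENCY = {"CNOT": 9, "SWAP": 7, "preparation": 9}

# A 1-Rec is an encoded gate followed by error correction.
rec_latency = {gate: steps + EC_LATENCY for gate, steps in GATE_LATENCY.items()}


def latency_ratio(k, nine_qubit=16, seven_qubit=35):
    """Ratio of CNOT 1-Rec latencies after k levels of concatenation."""
    return (nine_qubit / seven_qubit) ** k


ratios = [latency_ratio(k) for k in range(1, 4)]
```

Under these numbers the ratio per level is 16/35, roughly 0.46, so at three levels of concatenation the 9-qubit implementation would be about ten times faster.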



E. Universality 

Universality of this scheme can be achieved by supplementing the gates implemented so far with the phase gate S
and its square root, the T gate. Since the 9-qubit code does not allow
for a simple, fault-tolerant implementation of these gates, we need to use a different approach. One possibility,
following [10], is to purify noisy encoded $|{+i}\rangle$ states using only Clifford-group gates (which are virtually error free at
high concatenation levels), and later use the purified encoded $|{+i}\rangle$ state to apply the S gate with higher reliability.
This can be done at any concatenation level, using the injection by teleportation technique developed by Knill [11]
to insert a noisy encoded $|{+i}\rangle$ state at the required level. This approach is also used in [1] to reduce the impact on
the threshold of preparing the ancillas required to implement the S and T gates. There it is shown that the threshold
for the preparation of these ancilla states is higher than the threshold for Clifford gates. That analysis also applies
to the 9-qubit code (the only difference being the number of gates involved in the decoding during the injection by
teleportation, and that number is comparable for both codes). Once we are able to apply S we can use it to generate
the ancilla state required to apply T. These preparations can be done offline, since these gates are only required at
the algorithmic level.



VI. ERROR THRESHOLD 

We compute the error threshold for the case of stochastic, adversarial noise. Following the procedure presented
in [?] and also used in [?], we compute the number of malignant pairs of locations for the CNOT extended rectangle.
There are 7 different types of locations: (1) $|+\rangle$ preparation, (2) $|0\rangle$ preparation, (3) memory, (4) SWAP gate, (5)
Z measurement, (6) X measurement, and (7) CNOT gate. The error probability for the $k$-th level CNOT extended
rectangle is given by

\[ \epsilon_{\mathrm{CNOT}}^{(k)} \le \sum_{i \le j} \alpha_{ij}\, \epsilon_i^{(k-1)} \epsilon_j^{(k-1)} + B \left( \epsilon_{\max}^{(k-1)} \right)^3, \tag{7} \]

where $\epsilon_i^{(k)}$ is the error rate of a gate of type $i$ at level $k$ and $\epsilon_{\max}^{(k)}$ is the maximum of all the error rates (which in
our case is $\epsilon_7^{(k)}$, the CNOT error rate). The second term on the RHS of (7) bounds the contribution of third and higher order terms, with
$B = \binom{791}{3}$, where 791 is the number of locations in the CNOT extended rectangle. The coefficients $\alpha_{ij}$ represent
the number of malignant pairs of locations of each kind, and are given by

[Matrix of the coefficients $\alpha_{ij}$]    (8)

If we take all error rates to be equal, the total number of malignant pairs is $A = 75{,}880$. Going back to Eq. (7) we
can write

\[ \epsilon^{(k)} \le A' \left( \epsilon^{(k-1)} \right)^2, \tag{9} \]

with $A' = \frac{A}{2} + \sqrt{\frac{A^2}{4} + B} = 76{,}948$. This gives a lower bound for the accuracy threshold of $1.3 \times 10^{-5}$. If
we assume (as is usually done) that the memory errors are one tenth of the other errors, the threshold becomes
$2.02 \times 10^{-5}$.
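
The numbers quoted for the equal-error-rate case can be cross-checked with a few lines of arithmetic. The sketch below assumes that $A'$ is chosen as the positive root of $A'^2 = A A' + B$, so that the cubic term $B\epsilon^3$ is absorbed into the quadratic one for all $\epsilon \le 1/A'$; this choice is our reading of the garbled expression, although it does reproduce the quoted value 76,948. Variable names are hypothetical.

```python
import math

A = 75_880       # total number of malignant pairs for equal error rates
LOCATIONS = 791  # locations in the CNOT extended rectangle

# Number of ways to pick three faulty locations, bounding third-order terms.
B = math.comb(LOCATIONS, 3)

# Positive root of A'^2 = A * A' + B (assumed form, see the lead-in).
A_prime = A / 2 + math.sqrt(A * A / 4 + B)

# Lower bound on the accuracy threshold for equal error rates.
threshold = 1 / A_prime
```

With these inputs the triple count is 82,173,035 and the resulting bound rounds to $1.3 \times 10^{-5}$. The value for reduced memory errors depends on the individual coefficients $\alpha_{ij}$ and is not reproduced here.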



In this paper we have analyzed the implementation of fault-tolerant quantum computing in a local, 2-dimensional 
setting. In particular, we studied the effects on the latency of the computation that such constraints will have. 
Locality requires that any pair of interacting qubits have to be placed next to each other before the interaction can 
take place. This is accomplished by applying a chain of SWAP gates, between a qubit and one of its neighbors, until 
the two qubits are side by side and the required local logical gate can be applied. This clearly affects the latency of 
the computation. The fact that this movement is restricted to a 2-dimensional setting will also impact the latency 
when several moving qubits have to be maneuvered to avoid each other. 

Another issue is that our best approach to implement fault-tolerant quantum computing, through concatenation of 
error-correcting codes, yields an exponential increase in latency with the number of concatenation levels. This is not 
as bad as it may seem, since we would only need a finite number of concatenation levels. But from a practical point 
of view it will translate into huge latency overheads. 

The answer to minimizing the latency increase due to concatenation rests on a clever choice of the error-correcting
code. Since concatenation requires the application of error correction after every single encoded gate at every encoded
level, it is clear that an encoding with a fast (in terms of latency) error-correction procedure is highly desirable.
And because we are restricted to local operations, and this need for transporting qubits is at the heart of the latency
increase, it is reasonable to look at small codes that can be embedded in a compact 2-dimensional architecture that
minimizes qubit transport. This is the setting in which the 9-qubit code has been shown to possess remarkable properties
that work to our advantage.

Furthermore, a simple analysis of error correction shows that the latency of this process has a lower bound. To
perform error correction we need to prepare and entangle ancilla qubits, make them interact with the data, and measure
them. All these steps are unavoidable and translate into a lower bound for the latency of the concatenated circuit.
The short latency of the 9-qubit code originates in the fact that it does not require ancilla verification. This latency
is probably as short as it can be made. Implementing the code in a local, 2-dimensional architecture only increases the
latency from 6 to 7 time steps. This results in a very fast and compact implementation of the encoded CNOT 1-Rec.
Compared to a similar implementation [1] that uses Steane's 7-qubit code, the 9-qubit code improves the latency of the
CNOT 1-Rec (encoded gate plus error correction) by a factor of 0.45, which translates into an improvement in overall
latency of $(0.45)^k$, with $k$ the number of concatenation levels.
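
To give a feel for how quickly the per-level factor compounds, the short sketch below evaluates the latency ratio for the first few concatenation levels. It assumes that the factor of 0.45 applies identically at every level. The function name `latency_ratio` is a hypothetical helper introduced only for this illustration.

```python
def latency_ratio(levels, per_level_factor=0.45):
    """Ratio of overall latencies after `levels` levels of concatenation.

    Assumption: the same per-level factor compounds multiplicatively
    at every level of concatenation.
    """
    if levels < 0:
        raise ValueError("levels must be non-negative")
    return per_level_factor ** levels


for k in range(1, 5):
    print(f"k = {k}: latency ratio = {latency_ratio(k):.4f}")
```

Under this assumption the ratio is about 0.20 at two levels and about 0.09 at three levels, so the advantage grows rapidly with the number of levels.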

The 9-qubit code implementation also improves the accuracy threshold for Clifford-group gates (the only ones
needed for error correction). If we take all errors to have the same rate, the threshold becomes $1.3 \times 10^{-5}$,
compared to $1.1 \times 10^{-5}$ for the 7-qubit code. If we take memory errors to have a lower error rate (one tenth of
other error rates), we obtain a threshold of $2.02 \times 10^{-5}$ ($1.85 \times 10^{-5}$ for the 7-qubit code). The
threshold increase is modest, but the latency improvement is sizable. Furthermore, the space overhead is essentially the
same (49 qubits in the encoding block for the 9-qubit code versus 48 qubits for the 7-qubit code), but the 9-qubit
implementation has the added feature of treating encoded gates the same way whether the encoded blocks are horizontally
adjacent or vertically adjacent, as opposed to the 7-qubit code, whose 6 by 8 basic array of physical qubits necessarily
needs to treat these two operations differently.
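
The size of these improvements can be checked with a few lines of arithmetic. The sketch below compares only the leading coefficients of the quoted thresholds, assuming that the two thresholds in each pair share the same power of ten. The helper `relative_gain` is a hypothetical name used for this illustration.

```python
def relative_gain(new, old):
    """Fractional change of `new` relative to `old`."""
    return (new - old) / old


# Leading coefficients of the quoted thresholds (9-qubit code, 7-qubit code).
# Assumption: both values in each pair carry the same power of ten.
equal_rates = relative_gain(1.3, 1.1)
reduced_memory = relative_gain(2.02, 1.85)

# Physical qubits per encoding block (9-qubit code, 7-qubit code).
space = relative_gain(49, 48)

print(f"threshold gain, equal error rates:     {equal_rates:.1%}")
print(f"threshold gain, reduced memory errors: {reduced_memory:.1%}")
print(f"additional space overhead:             {space:.1%}")
```

The threshold gains come out at roughly 18% and 9%, against about 2% more physical qubits per block.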

Universality can be achieved by supplementing the gates implemented here with the phase gate S and its square root,
the T gate, which lies outside the Clifford group. Since the 9-qubit code does not allow for a simple, fault-tolerant
implementation of these gates, this is accomplished by the preparation of special ancilla states that can be used to
apply the required gates. Even though the preparation of these ancilla states is not fault-tolerant, we can combine the
techniques of injection by teleportation and distillation to prepare the required states with high fidelity at the desired
level of concatenation. These preparations can be done offline, since these gates are only required at the algorithmic
level. How to integrate these preparations into a 2-dimensional architecture in an efficient manner is the next step
that needs to be addressed in this implementation.
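
The relation between the two gates named above can be verified numerically. The sketch below uses the standard convention S = diag(1, i) and T = diag(1, e^{iπ/4}) and represents each diagonal gate by its two diagonal entries. The helper names `compose` and `close` are hypothetical and serve only this illustration.

```python
import cmath

# Each diagonal single-qubit gate is stored as its two diagonal entries.
T = (1, cmath.exp(1j * cmath.pi / 4))
S = (1, 1j)
Z = (1, -1)


def compose(a, b):
    """Product of two diagonal gates (entrywise product of the diagonals)."""
    return tuple(x * y for x, y in zip(a, b))


def close(a, b, tol=1e-12):
    """True if two diagonal gates agree up to numerical tolerance."""
    return all(abs(x - y) < tol for x, y in zip(a, b))


print(close(compose(T, T), S))  # True: T squared equals S
print(close(compose(S, S), Z))  # True: S squared equals Z
```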



APPENDIX A: DETAILED IMPLEMENTATION OF ERROR CORRECTION AND ENCODED STATE PREPARATION



Here we present a pictorial representation of each step required to perform both error correction and encoded state
preparation. We have already shown a couple of snapshots of the error correction procedure in Figure 6. Here we
present the complete sequence. Data qubits are represented by d1, d2, etc., and ancilla qubits are denoted a1, a2,
etc. The dummy qubits, used to prepare and measure the ancilla as well as for qubit transport, are represented by O.
The preparation of ancilla qubits is denoted by Px,z(aj), which represents the preparation of the qubit in the eigenstate
of either X or Z, with eigenvalue 1. The single-headed arrows represent CNOT gates, pointing from control to target
qubits; the double-headed arrows represent SWAP gates; and qubit measurements in the X or Z bases are represented
by Mx,z().
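
The positions of the data qubits in the diagrams that follow (rows of the form "O d1 O d2 O d3 O") can be generated with a short sketch. It assumes 0-based (row, column) indices and places d1 to d9 on the sites where both indices are odd, with dummy qubits O everywhere else. The function name `initial_layout` is a hypothetical helper, not part of the original implementation.

```python
def initial_layout(size=7):
    """Build a size x size array with data qubits on the odd (row, column)
    sites (0-based) and dummy qubits 'O' on all other sites."""
    grid = [["O"] * size for _ in range(size)]
    label = 1
    for row in range(1, size, 2):
        for col in range(1, size, 2):
            grid[row][col] = f"d{label}"
            label += 1
    return grid


for row in initial_layout():
    print(" ".join(f"{cell:<2}" for cell in row))
```

Interleaving the data qubits with dummy sites leaves room next to each data qubit for the ancilla preparation, transport, and measurement steps shown in the diagrams.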



1. Error correction 



Time step: 0

O         O         O         O         O         O         O

O         d1        O         d2        O         d3        O

O         O         O         O         O         O         O

O         d4        O         d5        O         d6        O

O         O         O         O         O         O         O

O         d7        O         d8        O         d9        O

O         O         O         O         O         O         O
Time step: 1

O         O         Pz(a1)    Px(a2)    O         O         O

O         d1        O         d2        O         d3        O

O         O         Pz(a4)    Px(a5)    O         O         Px(a16)

Pz(a11)   d4        O         d5        O         d6        Pz(a17)

Px(a12)   O         Pz(a14)   O         O         O         O

O         d7        Px(a15)   d8        O         d9        O

O         O         O         Px(a8)    Pz(a9)    O         O
Time step: 2

O         O         a1     ←  a2        Pz(a3)    O         O

O         d1        O         d2        O         d3        O

Px(a10)   O         a4     ←  a5        Pz(a6)    O         a16

                                                            ↓

a11       d4        Px(a13)   d5        O         d6        a17

↑

a12       O         a14       O         O         O         Px(a18)

                    ↑

O         d7        a15       d8        O         d9        O

O         O         Pz(a7)    a8     →  a9        O         O
Time step: 3

O         a1     ↔  O         a2     →  a3        O         O

O         d1        O         d2        O         d3        a16

                                                            ↕

a10       a4     ↔  O         a5     →  a6        O         O

↓

a11       d4        a13       d5        O         d6        a17

                    ↓                                       ↑

O         O         a14       O         O         O         a18

↕

a12       d7        a15    →  d8        O         d9        O

O         O         a7     ←  a8        O      ↔  a9        O



Time steps: 4 to 7

[Qubit-array diagrams for time steps 4 to 7; the layouts are not recoverable from the source. The legible fragments show the ancilla measurements Mx(a10), Mx(a11), Mx(a15), Mx(a16), Mx(a18), Mz(a3), Mz(a6) and Mz(a8).]


2. Preparation of |0⟩

Time step: 3

O         a1        a10       a2        a11       a3        a12

O         a4        a13       a5        a14       a6        a15

O         a7     ←  a16       a8     ←  a17       a9     ←  a18

O         d1     →  a19       d2     →  a20       d3     →  a21

O         d4        a22       d5        a23       d6        a24

O         d7        a19       d8        a20       d9        a21

O         O         O         O         O         O         O

Time step: 4

O         a1     ←  a10       a2     ←  a11       a3     ←  a12

O         a4     ←  a13       a5     ←  a14       a6     ←  a15

O         Mz(a7)    a16       Mz(a8)    a17       Mz(a9)    a18

O         d1        Mz(a19)   d2        Mz(a20)   d3        Mz(a21)

O         d4     →  a22       d5     →  a23       d6     →  a24

O         d7     →  a19       d8     →  a20       d9     →  a21

O         O         O         O         O         O         O

Time step: 5

O         Mz(a1)    a10       Mz(a2)    a11       Mz(a3)    a12

O         Mz(a4)    a13       Mz(a5)    a14       Mz(a6)    a15

O         d1        O         d2        O         d3        O

          ↕         ↕         ↕         ↕         ↕         ↕

O         O         a16       O         a17       O         a18

O         d4        Mz(a22)   d5        Mz(a23)   d6        Mz(a24)

O         d7        Mz(a19)   d8        Mz(a20)   d9        Mz(a21)

O         O         O         O         O         O         O
Time step: 6

O         O         a10       O         a11       O         a12

O         d1        O         d2        O         d3        O

          ↕         ↕         ↕         ↕         ↕         ↕

O         O         a13       O         a14       O         a15

O         d4        O         d5        O         d6        O

          ↕         ↕         ↕         ↕         ↕         ↕

O         O         a16       O         a17       O         a18

O         d7        O         d8        O         d9        O

O         O         O         O         O         O         O

Time step: 7

O         O         O         O         O         O         O

                    ↕                   ↕                   ↕

O         d1        a10       d2        a11       d3        a12

O         O         O         O         O         O         O

                    ↕                   ↕                   ↕

O         d4        a13       d5        a14       d6        a15

O         O         O         O         O         O         O

                    ↕                   ↕                   ↕

O         d7        a16       d8        a17       d9        a18

O         O         O         O         O         O         O

Time step: 8

O         O         O         O         O         O         O

O         d1     ←  a10       d2     ←  a11       d3     ←  a12

O         O         O         O         O         O         O

O         d4     ←  a13       d5     ←  a14       d6     ←  a15

O         O         O         O         O         O         O

O         d7     ←  a16       d8     ←  a17       d9     ←  a18

O         O         O         O         O         O         O






Time step: 9

O         O         O         O         O         O         O

O         d1        Mx(a10)   d2        Mx(a11)   d3        Mx(a12)

O         O         O         O         O         O         O

O         d4        Mx(a13)   d5        Mx(a14)   d6        Mx(a15)

O         O         O         O         O         O         O

O         d7        Mx(a16)   d8        Mx(a17)   d9        Mx(a18)

O         O         O         O         O         O         O